31 research outputs found

    Assessment of Lead, Zinc and Cadmium Contamination in the Fruit of Palestinian Date Palm Cultivars Growing at Jericho Governorate

    Phoenix dactylifera L. fruits were studied to assess whether they are safe for human consumption and to evaluate the date fruit as a bio-monitor of heavy metal pollution in Palestine. To this end, the study measured the levels of the toxic heavy metals Pb, Cd, and Zn in thirty-five date varieties collected from three locations in Jericho (NARC, DH and ADS) using atomic absorption spectrometry. Mean values of heavy metals were calculated and expressed. The concentrations of heavy metals in the flesh of the date fruits were relatively higher than those in the fruit washing residue. Heavy metal levels in date palm fruits collected from the NARC station (in the city center) were higher than those from the ADS and DH stations (far from the city center), owing to greater human activity and heavier vehicular traffic. The results indicate that most of the studied heavy metals are within safe limits with respect to maximum allowable levels (MAL) in some date cultivars. Keywords: Phoenix dactylifera L., Heavy metals, Lead, Zinc, Cadmium, Jericho. DOI: 10.7176/JBAH/10-2-02 Publication date: January 31st 202

    Cytotoxic Activity of Cyclamen Persicum Ethanolic Extract on MCF-7, PC-3 and LNCaP Cancer Cell Lines

    It is important to develop new approaches to increase the efficacy of cancer treatments. Nowadays, the use of natural products to treat cancer is very common. In addition, working with plants that are endemic to Palestine and determining the biological activities of their extracts is extremely important because of the potential for new drug development. Cyclamen persicum is used in traditional medicine as an anti-rheumatic and to treat diarrhea, abdominal pains, edema, abscesses, eczema, cancer and other ailments. In this study, the cytotoxic effect of ethanolic extracts of C. persicum tubers and leaves was studied against the MCF-7, PC-3 and LNCaP cancer cell lines using the mitochondrial dehydrogenase enzyme method. The results showed remarkable cytotoxic activity of C. persicum extracts against breast and prostate adenocarcinoma. For the tuber extract, the IC50 value was found to be 0.05 mg/ml for all three cell lines. For the leaf extract, the IC50 value was found to be 0.25 mg/ml for the PC-3 and MCF-7 cell lines, while LNCaP cell inhibition was less than 30% at all tested leaf extract concentrations. MCF-7 cells exhibited the highest sensitivity to the C. persicum extracts compared with the PC-3 and LNCaP cell lines evaluated. In contrast, LNCaP cells generally exhibited the lowest sensitivity to the extracts. These results indicate that C. persicum is a good source of natural products with antitumor compounds that can be further exploited for the development of a potential therapeutic anticancer agent. Key words: Cyclamen persicum, Cytotoxicity, MTT assay, LNCaP, MCF-7, PC-3. DOI: 10.7176/JNSR/10-2-05 Publication date: January 31st 202

    An AUC-based Permutation Variable Importance Measure for Random Forests

    The random forest (RF) method is a commonly used tool for classification with high dimensional data as well as for ranking candidate predictors based on the so-called random forest variable importance measures (VIMs). However, the classification performance of RF is known to be suboptimal in the case of strongly unbalanced data, i.e. data where response class sizes differ considerably. Suggestions have been made to obtain better classification performance based either on sampling procedures or on cost sensitivity analyses. However, to our knowledge the performance of the VIMs has not yet been examined in the case of unbalanced response classes. In this paper we explore the performance of the permutation VIM for unbalanced data settings and introduce an alternative permutation VIM based on the area under the curve (AUC) that is expected to be more robust towards class imbalance. We investigated the performance of the standard permutation VIM and of our novel AUC-based permutation VIM for different class imbalance levels using simulated data and real data. The results suggest that, as class imbalance increases, the standard permutation VIM loses its ability to discriminate between predictors that are associated with the response and predictors that are not. It is outperformed by our new AUC-based permutation VIM for unbalanced data settings, while the performance of both VIMs is very similar in the case of balanced classes. The new AUC-based VIM is implemented in the R package party for the unbiased RF variant based on conditional inference trees. The code implementing our study is available from the companion website: http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/070_drittmittel/janitza/index.html
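
The idea of an AUC-based permutation importance can be illustrated with a minimal Python sketch. This is not the implementation in the R package party: it assumes a generic scoring function (`score_fn`, a hypothetical stand-in for any fitted model) evaluated on a fixed data set, and the names `auc` and `auc_permutation_importance` are illustrative only. The importance of a predictor is taken as the mean drop in AUC after shuffling that predictor's column.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_permutation_importance(score_fn, X, y, column, n_repeats=20, seed=0):
    """Mean drop in AUC after randomly permuting one predictor column of X."""
    rng = random.Random(seed)
    baseline = auc(y, [score_fn(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        # Rebuild each row with the permuted value in place of the original one.
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        drops.append(baseline - auc(y, [score_fn(row) for row in X_perm]))
    return sum(drops) / n_repeats


# Toy unbalanced data (10% positives): column 0 determines the label, column 1 is noise.
noise = random.Random(1)
X = [[i / 200, noise.random()] for i in range(200)]
y = [1 if row[0] >= 0.9 else 0 for row in X]
score = lambda row: row[0]  # hypothetical stand-in for a fitted model's predicted score

imp_signal = auc_permutation_importance(score, X, y, column=0)
imp_noise = auc_permutation_importance(score, X, y, column=1)
```

In this toy setting the noise column has an importance of exactly zero because the scoring function ignores it, while permuting the informative column lowers the AUC substantially. Because the AUC depends only on how positives are ranked against negatives, it does not reward a model for simply predicting the majority class.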

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles, bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single and multiple imputation methods to recover the missing values, and then we build an ensemble of different classifiers on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, which combines multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.

    Dealing with Missing Data and Uncertainty in the Context of Data Mining

    Missing data is an issue in many real-world datasets, yet robust methods for dealing with missing data appropriately still need development. In this paper we investigate how some methods for handling missing data perform when the uncertainty increases. Using benchmark datasets from the UCI Machine Learning repository, we generate datasets for our experimentation with increasing amounts of data Missing Completely At Random (MCAR), both at the attribute level and at the record level. We then apply four classification algorithms: C4.5, Random Forest, Naïve Bayes and Support Vector Machines (SVMs). We measure the performance of each classifier on the basis of complete case analysis and simple imputation, and then we study the performance of the algorithms that can handle missing data. We find that complete case analysis has a detrimental effect because it renders many datasets infeasible when missing data increases, particularly for high dimensional data. We find that increasing missing data does have a negative effect on the performance of all the algorithms tested, but the algorithms tested, whether they use preprocessing in the form of simple imputation or handle the missing data themselves, do not show a significant difference in performance.
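
A minimal Python sketch can show how MCAR data might be generated and why complete case analysis breaks down for high dimensional data. It rests on assumptions that are not taken from the paper: each cell is masked independently with the same probability, missing values are represented by `None`, and the function names `make_mcar` and `complete_cases` are hypothetical.

```python
import random


def make_mcar(rows, missing_rate, seed=0):
    """Copy rows, setting each cell to None independently with probability missing_rate."""
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must lie in [0, 1]")
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row] for row in rows]


def complete_cases(rows):
    """Keep only the rows without any missing value (complete case analysis)."""
    return [row for row in rows if all(v is not None for v in row)]


# 1000 records with 20 attributes each, then 10% of the cells removed at random.
data = [[float(i + j) for j in range(20)] for i in range(1000)]
masked = make_mcar(data, missing_rate=0.1)
remaining = complete_cases(masked)
```

Under this masking scheme a record with d attributes stays complete with probability (1 - p)^d, so with p = 0.1 and d = 20 only about 12% of the records are expected to survive complete case analysis.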

    Predicting postoperative complications for gastric cancer patients using data mining

    Gastric cancer refers to the development of malignant cells that can grow in any part of the stomach. With the vast amount of data being collected daily in healthcare environments, it is possible to develop new algorithms that can support decision-making in the treatment of gastric cancer patients. This paper aims to predict, using the CRISP-DM methodology, the outcome of the hospitalization of gastric cancer patients who have undergone surgery, as well as the occurrence of postoperative complications during surgery. The study showed that, on the one hand, the random forest (RF) and naive Bayes (NB) algorithms are the best at detecting the outcome of hospitalization, taking into account patients’ clinical data. On the other hand, the J48, RF, and NB algorithms offer better results in predicting postoperative complications. FCT - Fundação para a Ciência e a Tecnologia (UID/CEC/00319/2013

    Predicting disease risks from highly imbalanced data using random forest

    Background: We present a method utilizing Healthcare Cost and Utilization Project (HCUP) dataset for predicting disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare. Methods: We employed the National Inpatient Sample (NIS) data, which is publicly available through Healthcare Cost and Utilization Project (HCUP), to train random forest classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF to predict the risk of eight chronic diseases. Results: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process. Conclusions: In combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
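
Repeated random sub-sampling into balanced sub-samples can be sketched in Python as follows. The abstract does not specify how the sub-samples are drawn, so this sketch assumes one common scheme: every sub-sample keeps all minority-class records and adds an equally sized random draw, without replacement, from the majority class. The function name `balanced_subsamples` and the binary 0/1 labels are illustrative assumptions.

```python
import random


def balanced_subsamples(X, y, n_subsamples=10, seed=0):
    """Yield (X_sub, y_sub) pairs in which both classes have the same number of records."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    for _ in range(n_subsamples):
        # All minority records plus an equally sized random draw from the majority class.
        idx = minority + rng.sample(majority, len(minority))
        rng.shuffle(idx)
        yield [X[i] for i in idx], [y[i] for i in idx]


# Toy data with 5% positives; each sub-sample holds 5 positives and 5 negatives.
X = [[float(i)] for i in range(100)]
y = [1] * 5 + [0] * 95
subsamples = list(balanced_subsamples(X, y, n_subsamples=3))
```

One classifier would then be trained on each sub-sample and their predictions combined, for example by majority vote or by averaging predicted probabilities; the combination rule is likewise an assumption here.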

    Automated time activity classification based on global positioning system (GPS) tracking data

    Background: Air pollution epidemiological studies are increasingly using global positioning system (GPS) to collect time-location data because they offer continuous tracking, high temporal resolution, and minimum reporting burden for participants. However, substantial uncertainties in the processing and classifying of raw GPS data create challenges for reliably characterizing time activity patterns. We developed and evaluated models to classify people's major time activity patterns from continuous GPS tracking data. Methods: We developed and evaluated two automated models to classify major time activity patterns (i.e., indoor, outdoor static, outdoor walking, and in-vehicle travel) based on GPS time activity data collected under free living conditions for 47 participants (N = 131 person-days) from the Harbor Communities Time Location Study (HCTLS) in 2008 and supplemental GPS data collected from three UC-Irvine research staff (N = 21 person-days) in 2010. Time activity patterns used for model development were manually classified by research staff using information from participant GPS recordings, activity logs, and follow-up interviews. We evaluated two models: (a) a rule-based model that developed user-defined rules based on time, speed, and spatial location, and (b) a random forest decision tree model. Results: Indoor, outdoor static, outdoor walking and in-vehicle travel activities accounted for 82.7%, 6.1%, 3.2% and 7.2% of manually-classified time activities in the HCTLS dataset, respectively. The rule-based model classified indoor and in-vehicle travel periods reasonably well (Indoor: sensitivity > 91%, specificity > 80%, and precision > 96%; in-vehicle travel: sensitivity > 71%, specificity > 99%, and precision > 88%), but the performance was moderate for outdoor static and outdoor walking predictions. No striking differences in performance were observed between the rule-based and the random forest models. The random forest model was fast and easy to execute, but was likely less robust than the rule-based model under the condition of biased or poor quality training data. Conclusions: Our models can successfully identify indoor and in-vehicle travel points from the raw GPS data, but challenges remain in developing models to distinguish outdoor static points and walking. Accurate training data are essential in developing reliable models in classifying time-activity patterns.